Motivations, concerns and selection biases when posting preprints: A survey of bioRxiv authors

Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same journals that were not posted. However, the exact causal mechanism for this effect has not been established, and may in part be related to authors’ biases in the selection of articles that are chosen to be posted as preprints. We aimed to investigate this mechanism by conducting a mixed-methods survey of 1,444 authors of bioRxiv preprints, to investigate the reasons that they post or do not post certain articles as preprints, and to make comparisons between articles they choose to post and not post as preprints. We find that authors are most strongly motivated to post preprints to increase awareness of their work and increase the speed of its dissemination; conversely, the strongest reasons for not posting preprints centre around a lack of awareness of preprints and reluctance to publicly post work that has not undergone a peer review process. We additionally find evidence that authors do not consider quality, novelty or significance when posting or not posting research as preprints, however, authors retain an expectation that articles they post as preprints will receive more citations or be shared more widely online than articles not posted.


Introduction
Preprints have become an integral part of the scholarly communication process. Whilst no singular authoritative definition of a "preprint" (in some disciplines referred to as a "working paper") exists [1,2], they are most commonly considered to be complete versions of scientific manuscripts that are posted to open online repositories prior to (or at the same time as) undergoing formal peer review and publication in a scientific journal, a process that can take between 9 and 18 months on average, dependent on the discipline [3]. In some scientific disciplines preprints have already become established as a norm. For example, in Physics and Mathematics, the preprint server arXiv (https://arxiv.org) has become widely established since Attention Scores than articles without a preprint, whilst Fraser et al. [20] report that articles with a preprint received 1.56 times more citations, 2.33 times more tweets, 1.55 times more blog mentions, 1.47 times more mainstream media mentions, 1.30 times more Wikipedia citations and 1.81 times more Mendeley reads than articles without a preprint. These findings of a "bioRxiv citation/altmetric advantage" are in agreement with findings based on similar studies conducted on arXiv [5, [21][22][23][24], and related studies that have investigated the more general Open Access (OA) citation advantage: a recent review of 134 of these studies found that a larger proportion (47.8%) confirm the existence of an OA advantage compared to those that find the opposite effect (27.6%), but it should be noted that the quality and heterogeneity of various studies and their designs make their results difficult to generalise [25].
Despite the large effect sizes of the bioRxiv citation/altmetric advantage reported, the aforementioned studies can still only be considered as observational in nature, i.e. it is not possible to ascribe the increase in citations or altmetrics to a causal mechanism. Key reasons are that studies may fail to include all possible confounding variables that might influence citation rates; most of the studies only account for intrinsic authorship, article or journal properties, but not for factors relating to authors' behaviour and decision-making processes that may lead to bias in the quality, and thus the "citeability" or "shareability" of preprints. This bias, previously termed the "Self-selection Bias Postulate", or alternatively the "Quality Postulate" [21,22,[26][27][28], may itself be manifested in two dimensions: (1) authors may select their highest "quality" articles (where quality, in this sense, may refer to the articles that the authors believe will generate the highest citation/online impact) to post as preprints, which would consequently be more strongly cited or shared on online platforms than their lower quality articles, or (2) higher "quality" authors may preferentially post their articles as preprints compared to lower quality authors.
In this study, we aimed to primarily investigate the mechanism underlying the first dimension of the Self-selection Bias Postulate as applied to preprints, to investigate how author's behaviours and motivations lead them to select certain articles to post as preprints. To achieve this, we conducted an online survey of bioRxiv preprint authors in which we asked them to report on their publishing and preprinting activities, with a specific focus on factors that support their decision to post or not post certain articles as preprints, and to report on differences between articles that they decide to post as preprints and those they do not (e.g. in terms of quality, novelty, or significance).

Related work
A number of previous surveys have investigated the motivations and concerns of authors when posting or not posting preprints. A survey conducted by the Association for Computational Linguistics (ACL) [29] (N = 623) found that the two strongest motivations to post preprints, supported by 80% and 70% of the respondents who had posted a preprint, were to "publicize my research as soon as I think it is ready" and to "timestamp the ideas in the paper", respectively. Only 32% of respondents reported that they have posted a preprint to "maximize the paper's citation count", implying that increasing a paper's impact ranks relatively low amongst computational linguists' motivations for preprinting. Conversely, when respondents who had not posted a preprint were asked for their reasons for not doing so, 71% reported that there "was no need when I intend to publish my papers", and 59% reported that they did not do so "to preserve the integrity of double-blind reviewing".
A survey of community members of the Special Interest Group on Information Retrieval (SIGIR) [30] (N = 159) found broadly similar results to that of the ACL survey: the two strongest motivations for posting preprints remained to "publicize my research as soon as I think it is ready" (67%) and to "timestamp the ideas in the paper" (81%), with only 48% of respondents posting preprints to "maximize the paper's citation count", whilst the most-agreed with reasons for not posting preprints were "I want to preserve the integrity of double-blind reviewing" (65%), and "I do not see the need when I intend to publish my papers at a conference or in a journal" (63%).
In the context of bioRxiv, Sever et al. [11] previously conducted a survey of bioRxiv authors, readers, and non-users to ask about their motivations for posting preprints, reasons that they had not posted a preprint on bioRxiv (if applicable), and how posting a preprint on bioRxiv has benefited their careers. They found that the strongest motivations of preprint authors (N = 3,364) were to "increase awareness of your research" (80% of authors) and "to benefit science" (69%), whilst only 54% of authors did so "to stake a priority claim on your research", a lower proportion than was reported in the respective questions of the ACL and SIGIR surveys. Unlike the ACL and SIGIR surveys, the survey of Sever et al. [11] did not specifically ask authors about the effect of posting preprints on impact indicators, e.g. citations or altmetrics. Of those who had not posted a preprint to bioRxiv (N = 844), the most frequently reported reason was that authors "do not have enough data yet for a research manuscript" (35%), followed by "you are still deciding whether posting a preprint is the right choice for you" (24%), although only 10% of bioRxiv non-authors agreed that they "are not convinced that preprints are a good idea". In terms of benefits of posting a preprint to bioRxiv, by far the most-frequently agreed with reason was that they "Increased awareness of your research" (73%)-the second most-frequently agreed with reason was that they "Helped stake a priority claim for your research" (28%).
A more recent survey conducted by ASAPbio (https://asapbio.org/), a nonprofit that advocates for "innovation and transparency in life sciences communication", also investigated the perceived benefits and concerns surrounding preprints (N = 546) [31]. The preliminary findings broadly echoed those of Sever et al. [11]: the benefit most strongly agreed with was "increasing the speed of research communication"; however, agreement was stronger amongst authors who had previously posted preprints versus those who had not. Although all of these surveys have targeted different groups at different times, some clear themes have emerged in terms of factors that support authors' decision to post preprints: they do so primarily to increase awareness of their work (i.e. increased distribution and/or readership amongst colleagues and other scientists), to increase the speed of its dissemination into relevant communities, to enable free/unrestricted access for readers, to receive more feedback, and to stake a priority claim on their research ideas, whilst other motivating factors such as increasing citation counts, or supporting career development (e.g. to cite a preprint in a job application) appear to be secondary. In contrast, factors that discourage authors from posting preprints focus mainly on concerns surrounding preprint integrity due to lack of peer-review, risks of premature reporting in the media, and of work being "scooped" when published too early. Notably, external pressures to post or not post preprints (e.g. posting due to institutional/funding policies, or not posting due to journal policies/the Ingelfinger rule, that prohibits publication of manuscripts that have been previously published or submitted elsewhere [32]) appear to rank relatively low amongst researchers' motivations or concerns when deciding whether to post a preprint.
A related study by Chiarelli et al. [1] also explored the perceived benefits and challenges of preprint posting, although their study concentrated on a range of stakeholders not limited to preprint authors (e.g. research funders, preprint servers and service providers), and used semistructured interview methods rather than quantitative surveys. However, the results largely echoed those of the previously discussed surveys: stakeholders reported the most important benefits of preprints to be early and rapid dissemination, and increased opportunities for feedback (both mentioned by >20 interviewees from a sample of 38 interviewees), whilst increasing citation counts was a less important factor (mentioned by <10 interviewees). The most important challenges of preprints were the lack of quality assurance, limited use of commenting/feedback, risk of media reporting incorrect research, possible harm in the case of sensitive areas, and questionable value of self-appointed reviewers (all mentioned by >20 interviewees). The study concludes that a one-size-fits-all approach to preprints is not feasible: different disciplinary communities are at different stages in their processes of using, adopting or experimenting with preprints, and that the future of preprint servers and related infrastructure remains unclear.

Motivation and research gap
Our survey approach contains some similarities to previous surveys that have investigated the motivations and concerns of authors to post or not post preprints, but differs in several important aspects: We ask authors about their motivations for posting or not posting preprints, with a specific focus on articles that were eventually published in scientific journals.
2. We ask authors to make direct comparisons between their published journal articles that were and were not posted as preprints.
Focusing on these missing aspects from previous surveys will allow us to partially fill the gap in understanding the Self-selection Bias Postulate: we hope to be able to understand the reasons that authors are motivated to post or not post certain journal articles as preprints, and relate these reasons to factors such as quality, novelty and expected citation performance of the articles.
Our previous work also showed that citation/online sharing rates of journal articles posted as preprints are related to various author-specific factors, e.g. higher citation rates were associated with more senior first and last authors, male first authors, and first authors from the USA [20]. We therefore aimed to distinguish differences in motivations and selection strategies of authors to post preprints between multiple demographics groups, namely the participants' country of residence (US versus non-US), gender, and career status.

Survey participants
Our pool of potential survey participants consisted of corresponding authors of preprints posted to bioRxiv between November 2013 (coinciding with the launch of bioRxiv) and December 2018. Email addresses of corresponding authors were harvested by crawling of bioRxiv public web pages using the R packages rcrossref for harvesting preprint DOIs [33] and rvest for reading and extracting HTML [34]. For each preprint, corresponding author email addresses were extracted from the HTML meta tag "citation_author_email" (note that a single preprint can list multiple corresponding author email addresses). We harvested a total of 49,447 raw email addresses associated with 23,986 preprints. Email addresses were subsequently validated through regular expressions, and duplicate email addresses were removedthese were either the result of authors submitting multiple preprints, or in a small number of cases the same email address being listed multiple times on the same preprint. Following these steps, we produced a list of 24,633 unique validated email addresses. Potential participants were invited via email to participate in the survey on 27th & 28th February 2020, and a followup invitation was sent on 25th March 2020. Responses were collected until 29th April 2020. Of 1,847 participants who began the survey, 1,444 participants fully completed it, representing a drop-out rate of 27.9% and a response rate (for fully complete surveys) of 5.9%. In all the following analysis, we only consider data from participants who fully completed the survey.

Survey development and design
A generalised overview of the survey design is shown in Fig 1. The survey was built with Lime-Survey version 2.73.1, and hosted on a server maintained by the University of Kiel, Germany. The survey was designed and survey items selected according to the authors' prior knowledge and based on a literature review of studies discussing preprint posting motivations and publishing behaviour, including those studies previously discussed [1,11,[29][30][31]. Before sending out the survey, a pilot test was conducted with a survey expert in one of our institutions (GESIS-Leibniz Institute for the Social Sciences), and survey items were adjusted according to feedback received.
In brief terms, the survey was divided into three main sections: The first section asked participants to report their demographic background (country of residence, main scientific disciplines, gender, career status and the institution type of their employer). For each question, participants were also able to opt-out of answering.
The second section asked participants to consider their recent (past 5 years) record of publishing in scientific journals as a corresponding author. With respect to journal articles published in that time period, participants were grouped according to the proportion of those articles that were also posted as preprints (authors who reported that they did not publish any journal articles in that time were excluded), and then directed conditionally to the following survey questions: authors who reported posting all of their journal articles as preprints were surveyed on their motivations for posting preprints (N = 232), authors who reported that none of their journal articles were posted as preprints were surveyed on their reasons for not posting preprints (N = 85), and authors who reported that a selection of their journal articles were posted as preprints were surveyed on both of the above areas, as well as self-reported differences between journal articles that were and were not posted as preprints (for example, whether there was a difference in the quality or novelty of preprinted versus non-preprinted articles) (N = 1,082). Please note that only authors that were listed as corresponding authors in bioRxiv were selected as survey participants; hence, the number of authors who reported that none of their journal articles has been posted as preprints might be surprising. We hypothesise that this is presumably because a) the preprints they have posted are not yet published in journals, b) they do not intend to publish them in a journal, or c) the published preprint does not fall into the suggested time frame (latest 5 years) for the question.
The third section of the survey repeated the second section in structure, but asked respondents to instead focus on their record of publishing in scientific journals as a co-author, defined in the survey as articles where the author was in any position other than the corresponding author (N = 55, 786 and 509 for authors reporting that all, some or none of their coauthored articles were posted as preprints, respectively). This division between sections 2 and 3 was designed to determine if differences arose for articles where authors had a higher degree of autonomy over the publication strategy (i.e. as a corresponding author), versus those where authors' responsibility for the publication strategy is perhaps more passive or moderated by the decisions of the leading authors.

Regression analysis
To better understand potential differences in preprinting behaviour between demographic groups (namely gender, career status and country of residence), we conducted ordinal logistic

PLOS ONE
regression for each of our survey questions that were answered on a Likert scale. To account for the high granularity of participant information, for the independent variable of "career status" participants were pooled into "Early Career" (Master's students, PhD students, Postdoctoral Researchers) or "Late Career" (Assistant/Associate/Full Professor) groups, and for the independent variable "country of residence" into "US" or "non-US" groups (this pooling is derived from previous work that has shown that~50% of first and last authors of bioRxiv preprints are from the US [20]). Participants that chose not to report certain demographic information were excluded from the regression analysis, as were participants with the "career status" of business or healthcare professionals, due to low numbers of participants from these groups. Regression analysis was limited to responses based on participants' experiences as a corresponding author (i.e. Section 2 from Fig 1). Regression analysis was conducted using the polr() function from the R package MASS [35].
All regression results are presented as exponentiated odds-ratios (OR), with 95% confidence intervals. Odds-ratios of an ordinal logistic regression model can generally be interpreted as the odds that a one-unit increase in an independent variable is associated with a higher value on an ordinal dependent variable: an OR greater than 1 indicates a positive association and an OR less than 1 indicates a negative association. In practical terms for our study (where we only use binary categorical independent variables), odds-ratios are therefore interpreted as the odds that the reported group (e.g. "early career" in the independent variable "career status") will rank an answer one-point higher on a Likert scale (e.g. "strongly agree" instead of "agree") than the comparison group ("late career").

Analysis of free-text comments
For survey questions relating to motivations, concerns and differences of preprinting behaviour in sections 2 and 3, participants were provided with the opportunity to add free-text responses to expand on their answers or add any further context. Free-text comments were coded following the general inductive approach of Thomas [36] and qualitative content analysis of Mayring [37] to reveal new and additional topics and themes relevant to the research questions: comments were initially read to gain familiarity with their contents, and then given one or several unstructured labels reflecting their contents (e.g. the comment "Preprints allowed everyone to read the manuscript when I could not publish them Open Access, due to prohibitive costs of open access fees." was initially given the labels "accessibility" and "cost"). Labels were subsequently iteratively revised, grouped and refined into~8-10 broader categories (e.g. "accessibility" and "cost" labels were subsequently grouped under the category "open science") that conveyed the core themes of the comments. A small number of responses that did not contain information relevant to the survey question asked were ignored. Labels that did not fit within any of the broader categories and occurred <10 times across all free-text responses were grouped into the category "other". A codebook was maintained during the coding process, with names, descriptions and examples of each theme. Mayring [37] highlights that the calculation of intercoder-reliability is only advised for deductive approaches, therefore all coding was conducted by the first author of this paper, NF. To validate the coding, the codebook was discussed among the authors of this paper to reach mutual agreement via discursive action [37]. A total of 1,095 comments were coded following this process: 291 related to motivations for posting preprints, 317 related to reasons for not posting preprints, and 487 to differences between articles that were and were not posted as preprints.

Data storage and processing
Data collected in this study, as well as the survey design and LimeSurvey files, are available on GitHub (https://github.com/nicholasmfraser/biorxiv_survey) and archived on Zenodo (https://doi.org/10.5281/zenodo.5255880). Raw free-text responses and participant email addresses were removed from the final archived datasets due to the potential for participant identification. Following collection of the survey data and coding of free-text responses, all subsequent analysis and visualizations were produced with R version 4.0.1 [38].

Survey participants
An overview of demographic information for survey participants is shown in Fig 2. In general terms, survey participants were biased towards those based in North America and Europe, to male participants, and to researchers based at universities.
Observed biases in the demographics of survey participants is at least partially a result of the bias in demographics of authors posting preprints to bioRxiv who were contacted to take the survey. For example, the most strongly represented disciplines of Bioinformatics, Genomics and Genetics amongst survey participants are also amongst the disciplines with the highest number of preprints [39], whilst previous studies have also found that authors from the US are overrepresented on bioRxiv [20,40], as are male authors [20]. However, there also exist some discrepancies in participant representation compared to our expectations; for example, only 12 of 1444 participants (0.8%) are from China, whilst 3.6% of last authors and 6.5% of all authors of bioRxiv preprints are from China [40].

Publishing behaviour
Participants were asked to report on their publishing behaviour over the past 5 years, including the number of articles they have published in peer-reviewed scientific journals in that time

PLOS ONE
period (both as a corresponding author and as a co-author), and the proportion of those articles that were posted as preprints (simplified to the categories of "All", "Some" or "None") (Fig 3).
Similar patterns of publication behaviour were found for corresponding authorships and co-authorships-the most frequent category for both groups was 3-5 publications (28.0% for corresponding authorships vs 25.8% for co-authorships). At the extremes, a higher proportion of participants reported that they had published zero articles as a co-author compared to as a corresponding author (6.5% vs 3.1%), and a higher proportion reported that they had published 21+ articles as a co-author compared to as a corresponding author (14.0% vs 11.2%). In terms of preprinting behaviour, the majority of participants reported that they only posted some of their articles as preprints during that time period both as a corresponding author (77.3%) and as a co-author (58.2%), whilst a higher proportion of participants reported posting all of their articles as a corresponding author as preprints (16.6%) compared to their articles as a co-author (4.1%).

What motivates researchers to post preprints?
Survey participants who reported that they had posted all or some of their recent journal articles as preprints (Fig 3) were presented with questions that focused specifically on this set of articles. Questions covered three main focus areas: decision-making (who was responsible for the decision to post a preprint) (Fig 4 and Table 1), motivating factors (what internal/external factors made the authors want to post a preprint) (Fig 5 and Table 2), and the benefits received in terms of article citation/online impact (Fig 6 and Table 3). Participants were also provided with a free-text area to expand on their answers for this section or add further reasoning (Table 4). In terms of decision-making (Fig 4), participants overwhelmingly reported that the decision to post their articles as preprints was the free decision of the authors (92.6% agreed/ strongly agreed for corresponding authorships, 87.4% for co-authorships), and was usually not externally driven by open access/preprint policies of their institutions (78.9% disagreed/ strongly disagreed for corresponding authorships, 63.4% for co-authorships) or funding agencies (77.2% disagreed/strongly disagreed for corresponding authorships, 63.5% for co-authorships). For articles where participants acted as the corresponding author, a higher proportion reported that they themselves suggested to post an article as a preprint (75.6% agreed/strongly disagreed) than their co-authors (28.0% agreed/strongly agreed); the effect is reversed for articles where participants acted as a co-author, where a higher proportion reported that their coauthors made the suggestion (49.2%) compared to making the suggestion themselves (42.9%).
Regression analysis for the set of questions presented in Fig 4 ( Table 1) show that non-US authors were less likely to have made the decision to post preprints freely (OR: 0.757, 95% CI: 0.570-0.999) compared to their US counterparts. Early career researchers were less likely to Following questions surrounding decision making, survey participants were asked about their own motivations to post preprints, with potential motivating factors being to increase awareness of their work, to claim priority over results, to benefit the scientific enterprise, to increase the amount of feedback received, or to increase rate of dissemination (Fig 5). In general, participants reported that all of these factors positively motivated them to post preprints; the strongest motivator was to share their findings more quickly (91.5% agreed/strongly agreed for corresponding authorships, 82.7% for co-authorships), followed by increasing awareness of

PLOS ONE
their research (86.9% agreed/strongly agreed for corresponding authorships, 79.5% for coauthorships). Receiving more feedback on their work was the weakest motivator overall (62.8% agreed/strongly agreed for corresponding authorships, 58.5% for co-authorships), yet remained a motivating factor for the majority of authors. In general, motivating factors do not appear to differ strongly between articles that authors acted as a corresponding author versus those where they acted solely as a co-author.
Regression analysis for the set of questions presented in Fig 5 (Table 2) show that early career researchers are motivated more strongly to post preprints to increase awareness of their research (OR: 1.317, 95% CI: 1.030-1.687) and to receive more feedback on their work (OR: 1.367, 95% CI: 1.079-1.732), but less likely to post preprints to stake a priority claim on their work (OR: 0.668, 95% CI: 0.528-0.846). Non-US authors were less likely to post preprints to increase awareness of their research (OR: 0.679, 95% CI: 0.524-0.878), to benefit the scientific enterprise (OR: 0.505, 95% CI: 0.392-0.651) or to share their findings more quickly (OR: 0.711, 95% CI: 0.543-0.927). These results are in agreement with results from the previous section, where non-US authors reported less freedom in decision-making over posting their preprints, and posted preprints more often to comply with institutional open access/funding policies; non-US authors may therefore be less motivated by external factors when deciding whether to post preprints or not.
The final questions in this section of the survey related to the potential benefits of preprints, in terms of increased citations and other forms of online dissemination (e.g. sharing on social media). The largest group of participants responded neutrally with respect to the effect of posting preprints on citations (50.6% neutral for corresponding authorships, 48.9% for co-authorships), although a larger proportion of participants agreed/strongly agreed that posting preprints had positive benefits on citations (38.7% for corresponding authorships, 42.5% for co-authorships) than disagreed/strongly disagreed (10.6% for corresponding authorships, 8.6% for co-authorships). In contrast, the majority of participants reported positive effects of  Table 3. Results of ordinal logistic regression for survey questions in Fig 6 (corresponding authorships only;  N = 1146). Values are presented as exponentiated odds-ratios, with the range in parentheses representing 95% confidence intervals. Significant results at the 95% level are indicated in bold and with " � ".

PLOS ONE
posting preprints on other forms of online dissemination (66.7% agreed/strongly agreed for corresponding authorships, 64.5% for co-authorships), suggesting that participants believe posting preprints has a stronger effect on other forms of online dissemination than on their citation impact.
Regression analysis for the set of questions presented in Fig 6 (Table 3) show that early career researchers are more likely to report positive benefits of posting preprints both in terms Table 4. Classification of free-text responses, in response to the question "Were there any additional reasons that motivated you to deposit these articles as preprints?". Categories are ordered alphabetically (except for "Other"). Percentages for corresponding and co-authorships refer to the proportion of all responses containing the relevant category; responses could contain multiple categories (N = 253 comments on corresponding authorships, 38 comments on co-authorships). In addition to the above questions, respondents were provided with a free-text box in this section of the survey, in which they could write comments to elaborate on their motivations for posting their articles as preprints (Table 4). Free-text responses were coded into categories that represented the core themes of the free-text comments (plus a category "other" for comments that did not fit into one of the main categories). Note that one comment could belong to several categories.

Category
Results from the free-text responses showed that career development is also an important motivation for submitting preprints (e.g. "I was on the job market, so posting preprints was a way to show employers that I was productive and had products in publishing stages even though they weren't yet in print.") for articles that participants served as corresponding authors (36.8% of all free-text responses mentioned this theme) and co-authors (29.7%). This includes submitting preprints to show evidence in job or grant applications, or simply to add research outputs to a CV that do not intend to be submitted to a journal. Other factors broadly mirrored those from the quantitative survey results: the next most-discussed theme was surrounding open science, i.e. posting preprints to support a free, unrestricted scientific system (e.g. "Just seems the obviously correct thing to do, given that I basically want as many people to have (free) access to and read my work as possible."; 26.5% for corresponding authorships; 29.7% for co-authorships) and increasing the speed at which research is available (e.g. "Move science forward quickly. We want to get our methods/findings/ideas out as quickly as possible so people can build off of them."; 17.8% for corresponding authorships; 13.5% for co-authorships). Many participants also discussed the themes relating to editorial processes, primarily expressing dissatisfaction with the current journal publishing system, including biases and a tendency to favour novelty over quality in peer review (e.g. "Our, subjective, assessment was that our manuscript was being intentionally delayed/blocked in the peer-review process. Wanting to reach out with an important and timely message made us investigate the option of posting a preprint."; 12.6% for corresponding authorships; 10.8% for co-authorships). The potential to increase citation/ online impact of work via preprints was also a well-discussed theme (e.g. "I think that submitting some manuscripts as a preprint is a good idea because this will increase the possibility of citations before publishing in the scientific journals."), although this was more common for articles that authors served as a corresponding author (20.2%) than as a co-author (5.4%). Other themes that motivated participants to post preprints were related to potential competition, receiving feedback, and policies, although all of these themes were discussed by less than 10% of participants for both corresponding authorships and co-authorships.

Why do some authors not post preprints?
The previous section investigated factors that motivate authors to post their articles as preprints. In this section we explore the opposite view-factors that cause authors to not want to post their articles as preprints. In this section, survey participants who responded that they posted either none or only some of their previously published journal articles as preprints (at the level of a corresponding author, and as a co-author) were presented with questions that focused specifically on the set of articles that were not posted as preprints. Results for the survey questions are shown in Fig 7, and regression results in Table 5; as with the previous section, participants were also provided with a free-text area to expand on these answers or add further reasoning (Table 6).
Overall, we found the strongest reason for authors to not post work as preprints to be that they were simply unaware of the option at the time that the work was conducted (44.4% agreed/strongly agreed for corresponding authorships, 30.1% for co-authorships). For the remainder of the questions, participants broadly disagreed with our potential reasons for not posting articles as preprints-in particular, participants disagreed/strongly disagreed that they were not allowed to post their articles as preprints (83.5% for corresponding authorships, 73.5% for co-authorships) or that posting a preprint would have had a negative effect on their work (80.2% for corresponding authorships, 73.8% for co-authorships). Interestingly, despite previous studies suggesting that the lack of quality assurance and premature reporting of incorrect/controversial results by the media are challenges for the establishment of preprints [1,31], in our results the majority of authors disagreed/strongly disagreed that their results were controversial and needed to be evaluated by peer-review before being published (79.8% for corresponding authorships, 74.2% for co-authorships).
Regression analysis for the set of questions presented in Fig 7 are shown in Table 5. Female authors were more likely to report that they were unaware of the option to post preprints at the time their articles were published (OR: 1.668; 95% CI: 1.307-2.133)-the same was also true of early career researchers (OR: 1.383; 95% CI: 1.067-1.794) and non-US based

PLOS ONE
researchers (OR: 1.406; 95% CI: 1.090-1.815). Early career researchers were also less likely to report that they did not post preprints because they did not want anyone else to see their preprint and publish before them (OR: 0.648; 95% CI: 0.495-0.847)-these results agree well with the previous section on motivations for posting preprints, where early career researchers were less likely to report that they post preprints to stake a priority claim on their findings, indicating overall that early career researchers are less motivated in their preprint publishing activities by competition compared to more senior scientists.
A summary of free-text responses elaborating on this section of the survey are shown in Table 6. Free-text responses confirmed some aspects of the qualitative survey results described above: participants reported most strongly that the reason for not posting articles that they served as corresponding author for was simply due to historic reasons, i.e. due to a lack of awareness of preprint servers or lack of existence of preprint servers at the time their journal articles were published (e.g. "Before 2018, I was not familiar with preprint servers, and didn't feel confident posting preprints simply because I wasn't sure what to expect"; 26.3% of corresponding authorships). Other important themes discussed regarding corresponding authorships were reluctance to post preprints prior to receiving feedback and quality control through journal peer review processes (e.g. "To prevent people reading the paper before it had gone through rigorous peer review. One example would be the case of a paper with clinical findings. I wanted the manuscript to go through rigorous review before making it public in case we had made an error or unintentionally misrepresented some aspect of our findings."; 22.3% of corresponding authorships), article types that were not allowed or not suitable to be posted as preprints (e.g. "In the field of taxonomy, I have the feeling that preprints may cause nomenclatural confusion. Therefore I would not opt for preprint deposition for alpha-taxonomic work, which consists part of my research."; 17.3% of corresponding authorships), and the extra labour and time required to format and submit preprints in addition to the journal formatting and submission process (e.g. "Didn't have time. Although it's pretty easy, long author lists still take time to enter on bioRxiv."; 16.8% of corresponding authorships). Other lesser-discussed themes  7 (corresponding authorships only;  N = 1021). Values are presented as exponentiated odds-ratios, with the range in parentheses representing 95% confidence intervals. Significant results at the 95% level are indicated in bold and with " � ".

Survey Question Female Early Career non-US
I was unaware of the option to post preprints at that time (1.307-2.133) (1.067-1.794) (1.090-1.815) The journal(s) I wanted to submit to did not allow prior posting of preprints  Table 6. Classification of free-text responses, in response to the question "Were there any additional reasons that caused you to not deposit these articles as preprints?". Categories are ordered alphabetically (except for "Other"). Percentages for corresponding and co-authorships refer to the proportion of all responses containing the relevant category; responses could contain multiple categories (N = 180 comments on corresponding authorships, 137 comments on co-authorships).

PLOS ONE
relating to corresponding authorships centred on journal policies that do not allow posting of preprints, disagreement from co-authors who did not want to post preprints, the lack of accessibility benefits when the article will be published in an open-access journal anyway, and the potential for preprints to lead to "scooping" of results by competing groups. Although the latter factor was only discussed in a minority of comments (6.1% of corresponding authorships), it is interesting to note the conflict between authors who report that they post preprints as a method to prevent scooping and give them a competitive advantage, as noted in the previous section on preprint motivations, and authors who report that they do not post preprints to prevent scooping and retain their competitive advantage. The results suggest differences between groups in their perceptions of academic credit-one group are concerned that they will not receive credit as being the first researchers associated with a new finding, whilst the second group is concerned with receiving credit in the form of journal publications and do not want a potential competitor to be able to reproduce the work and receive this credit before them. With respect to co-authorships, by far the most discussed reason for not posting articles as preprints related to co-author preference (66.9% of free-text comments), underlining the importance of the corresponding author in deciding the publication strategy of an article.

Differences between posted and non-posted preprints
Survey participants who only reported that they posted some of their previously published journal articles as preprints (Fig 3) were also asked to report on the differences between articles that they posted as preprints, and those that they did not. Results for the survey questions are shown in Fig 8, regression results in Table 7 and additional free-text responses in Table 8.
Participants were initially asked to report on differences between articles in terms of article quality, novelty, societal value/significance, the impact factor of the publishing journal, as well as on their expectations for how articles would be received in terms of citations and other forms of online dissemination. Overall, a larger number of participants disagreed that articles they posted as preprints were higher quality (43.3% disagreed/strongly disagreed for corresponding authorships, 47.3% for co-authorships), more exciting/novel (39.6% disagreed/ strongly disagreed for corresponding authorships, 44.0% for co-authorships), or had higher societal value/significance (42.1% disagreed/strongly disagreed for corresponding authorships, 47.4% for co-authorships). Participants also mainly disagreed that the articles they submitted as preprints were published in journals with higher impact factors than those not posted (44.1% disagreed/strongly disagreed for corresponding authorships, 48.7% for coauthorships).
Despite these results showing that authors did not preferentially post their highest quality, most novel or significant articles as preprints, authors still expected their articles to perform well in terms of impact: 43.5% of corresponding authorships and 40.2% of co-authorships agreed/strongly agreed that they expected their articles they posted as preprints to receive more citations than the ones they did not posted, with the proportions rising to 69.3% and 60.1% for corresponding authorships and co-authorships, respectively, that expected articles they posted as preprints to be disseminated more widely online than those not posted. Authors thus retain the expectation that their articles will perform better in terms of citations and altmetrics despite no changes in quality/novelty/significance, perhaps as a result of their knowledge of prior research showing the link between preprints and citations/altmetrics [e.g. [18][19][20], or that they expect wider and earlier dissemination in their communities as a result of posting a preprint regardless of the article's quality/novelty/significance.
With respect to our regression analysis, overall we find little difference in reported results between different demographic groups, although female authors were less likely to report that they post articles as preprints that were expected to have a greater societal value/significance than those that were not posted (OR: 0.687, 95% CI: 0.530-0.889) ( Table 7).
Free-text responses (Table 8) detailed a range of additional differences that authors report between articles they do and do not post as preprints. With respect to corresponding authorships, 38.8% of respondents pointed to history/awareness of preprints as a reason for only posting a proportion of their articles as preprints, i.e. articles that they did not post as preprints were simply due to not knowing about preprints, or lack of establishment of preprints as an accepted norm at the time that they published their non-posted articles (e.g. "the main reason for not publishing on a preprint server was that this was not very common in the field (Biology/  8 (corresponding authorships only;  N = 953). Values are presented as exponentiated odds-ratios, with the range in parentheses representing 95% confidence intervals. Significant results at the 95% level are indicated in bold and with " � ".

Survey Question Female Early Career non-US
The articles that were posted as preprints were of a higher scientific quality than those that were not posted  Table 8. Classification of free-text responses, in response to the question "Were there any other differences between the articles you chose to deposit as preprints versus those you did not deposit, or differences in your expectations for how the articles would be received in your community?". Categories are ordered alphabetically (except for "Other"). Percentages for corresponding and co-authorships refer to the proportion of all responses containing the relevant category; responses could contain multiple categories (N = 394 comments on corresponding authorships, 93 comments on co-authorships). Competition Articles were deposited to claim priority over a research area or to prevent scooping. Articles were not deposited to prevent scooping or to prevent other groups publishing competing follow-up studies more quickly.

Category
"The articles that were not deposited were not done some because we were worried that we would be scooped before publication. We therefore did not put these as preprints." "Very few were deposited as preprints. I/we deposited preprints in cases where we feared we might get scooped." 13.2 4.3

Editorial process
Articles where deposited when difficulties during the review process arose, when articles were rejected, when articles were not expected to change significantly during the review process, when encouraged by a journal or editor. Articles were not deposited when the review process was expected to be fast, when an article was expected to change significantly during the review process.
"The one we chose to deposit was proving harder to publish than usual-we had to submit to several journals." "Articles deposited in preprint were not expected to evolve much during the revision. Articles not deposited in preprint were those for which I expected possibly significant changes during the revision, or additional work to strengthen the conclusions." 10.2 2.2

History/ awareness
Articles were deposited when authors were aware of/ had knowledge of preprinting, when preprints had gained acceptance in their field, when preprint infrastructure existed. Articles were not deposited when authors were unaware of/lacked knowledge of preprints, when preprints were not an accepted practice in their field, when preprint infrastructure did not exist. biophysics) a few years ago"), Other factors that drove corresponding authors to post certain papers were related to co-author preference (e.g. "a major criterion was on whether all authors agreed on posting them as preprint. As soon as one author objected, they were not posted"; 15.5% of corresponding authorships), speed (e.g. "The single article posted as a preprint was posted because it was urgent to do so. The data were useful for Endangered Species Act listing as well as for policies on invasive species mitigation. The publication process was taking too long and we needed the data publicly available quickly"; 13.5% of corresponding authorships), and competition (e.g. "The articles posted as preprint [sic] were more at risk of been scooped"; 13.2% of corresponding authorships). With respect to competition, we found conflicting reasoning for posting or not posting certain articles as preprints: some respondents considered preprints as a way to gain a competitive advantage by claiming priority over a research finding ("Articles posted were in possible competition, so ensured to be out first"), whilst others regarded the posting of preprints a competitive disadvantage ("Other novel results were not posted [. . .] to protect from competing groups to overtake with follow-ups before we could finish ours"). In terms of coauthorships, by far the most important factor that determined whether certain preprints were posted was co-author preferences (e.g. "For articles where I was not corresponding author I usually had a much more limited role, and I did not take part in deciding where to submit the articles (as preprints or for journals)"; 50.5% of co-authorships): these results are in good agreement with our previous section on decision-making for preprint posting (Fig 4), where the majority of corresponding authors reported that they made the decision themselves to post an article as a preprint.

Discussion and conclusions
We used a mixed-methods survey approach to investigate the motivations, concerns and selection biases of authors in posting preprints to bioRxiv, a preprint server for the biological sciences. These results are important to inform bibliometric studies aiming to better understand the causal effects of preprint publishing models on citations or other metrics of dissemination (i.e. altmetrics), as well as to better understand the diffusion of preprints into different communities, which may further inform future policies of scholarly stakeholders including researchers, institutions, funders and providers of preprint infrastructure. At the basic level, our results confirm results from previous surveys and interviews that have investigated the factors that motivate and prevent authors from posting preprints [1,11,[29][30][31]: we find the strongest motivations to post preprints are to increase awareness of research and share findings more quickly (Fig 5), although we find some differences between demographic groups, e.g. early career researchers appear to be more strongly driven to post preprints to increase awareness of their research and to receive feedback compared to late career researchers, who were more strongly motivated to post preprints to stake a priority claim on their work ( Table 2). An interesting aspect of these results, is that whilst survey participants were generally highly motivated to post preprints to increase awareness of their research, agreement was less strong on whether posting preprints actually had a positive benefit in terms of citations and/or metrics of online sharing (Fig 6), e.g. only 38.7% of authors agreed/strongly agreed that posting preprints had a positive effects on citations to their work. Such findings are highly relevant for studies aiming to understand the "true" citation effect of posting preprints. Qualitative results derived from free-text comments ( Table 4) supported the quantitative survey results, but added additional dimensions and reasons that were not previously captured, e.g. the high importance of preprints for career development and high levels of dissatisfaction with current journal publishing systems.
We additionally investigated the factors that cause authors to not publish articles as preprints, revealing a mixture of structural and self-determined reasons that preprints were not posted (Fig 7 and Tables 5 and 6). In particular, both our quantitative and qualitative results show that lack of awareness of preprints plays the most important role in their posting, with other important factors including hesitation to publish work that has not undergone peer review, the extra time and effort required to format and submit preprints, and the discouragement/disallowment of publishing certain types of articles as preprints (e.g. bioRxiv does not allow publishing of review articles on their platform). Although the so-called "Ingelfinger rule" is often mentioned as a factor that discourages authors from posting preprints, we found that journal policies only affected a minority of participants decisions to post preprints, in line with findings from a similar survey from ASAPbio [31] that found that external pressures rank relatively low on researchers' motivations and concerns. A small number of participants also reported that confusion in whether a journal would or would not accept a preprint discouraged them from posting a preprint, which may be rooted in the high proportion of journals that still do not have clear preprint policies [41].
A novel aspect of our survey in comparison to previously-mentioned studies is that we directly asked authors who have only posted a subset of their published journal articles as preprints about the differences between those they did and did not post. From the quantitative part of our survey results (Fig 8), we found that participants generally disagreed that articles chosen and not chosen to be posted as preprints differed in their quality, novelty or societal value/significance. Despite this, 43.5% of participants agreed/strongly agreed that they expected that their articles posted as preprints would receive more citations than those not posted as preprints, and 69.3% agreed/strongly agreed that they expected that their articles posted as preprints would be shared more widely online than those not posted as preprints.
These results appear to be in conflict with the first dimension of the Self-Selection Bias postulate, as previously discussed (i.e. that authors preferentially select their highest quality/most novel/most significant to post as preprints),-the majority of authors do not appear to be actively selecting articles to post as preprints based on subjective criteria. Since the survey is based on self-reported responses, it may be worth noting here that there is no stipulated benchmark for quality, significance or novelty. This may have introduced nuance or subjectivity in how the question around novelty, significance or quality has been interpreted. However, a question therefore remains at why authors still expect their articles to receive more citations/ be shared more widely online even if they do not believe those articles to differ in terms of quality/novelty/significance-the answer may lie in their motivations for submitting preprints in the first place, i.e. to increase awareness of their work and share findings more quickly, which could be expected to drive more citations and increase metrics of online sharing.
In summary, our results suggest that previous findings of a citation/altmetric advantage for bioRxiv preprints [18][19][20] are not strongly influenced by selection biases of preprint authors. However, we do not rule out that biases exist amongst the authors of preprints themselves, e.g. preprints may be preferentially authored by those "highest quality" researchers, or researchers who tend to work on more novel subject areas where early dissemination of results has a stronger effect than other less novel areas. In the current study we have not investigated this additional dimension of the Self-Selection Bias postulate, and encourage future studies in this area.

Limitations and future work
We acknowledge a number of important limitations of our study, which may be improved and built upon in future studies. Most importantly, we use a survey approach which requires selfreporting from respondents on their preprinting activities. Thus, we must make the assumption that any responses are reported honestly and without bias. During the survey we asked authors about their own potential biases in selecting articles that they have previously posted as preprints, versus those that were not posted as preprints.
However, in distributing our survey we may also introduce bias amongst survey participants-firstly that we have targeted primarily corresponding authors of preprints on bioRxiv (as we collected email address of bioRxiv preprint authors), and secondly that those who responded to the survey are potentially authors who are more engaged with preprints in general. Future studies may therefore consider targeting authors of journal articles who have never posted preprints, to determine if differences exist for authors who are less engaged with, or knowledgeable about, preprints. Whilst we have made efforts to understand the demographics of our survey respondents and the influence of these different groups on our survey results, there exist some biases in our sample (e.g. the overrepresentation of US-based and male authors) that mean that we should be cautious about generalising findings to the wider scientific system. Two limitations are related to the time period during which the survey was online: our survey was conducted in March and April 2020, and targeted authors of preprints that were posted between November 2013 and December 2018. The use and visibility of preprints has been growing in recent years, and the last two years in particular has seen a surge of preprints published in response to the COVID-19 pandemic [16]. Along with it, authors might have become more aware of or more interested in preprints which may have affected the sample composition and the response rate (we hypothesise a positive effect on the response rate). At the same time, during the survey the first lockdowns were in place in different countries which may again have introduced biases to the sample (e.g. in terms of gender, locations) and may have impacted the response rate (we hypothesise a negative effect on the response rate in this case).
Long-term monitoring and/or future replication of our study will be necessary to understand how authors' preprinting behaviour evolves over time, and what the long-term effects of "shocks" to the scientific publishing system, such as COVID-19, will have on the future development and usage of preprints. One promising avenue for future research is a stronger focus on disciplinary differences to study the effects of being e.g. at different stages of use and experimentation with preprints and the decision-making of posting preprints. There might also be a difference in the perception of quality, significance, and novelty of a paper which depends on a discipline's familiarity with preprint servers. Further, it would be interesting to better investigate the role of discipline and familiarity with preprint servers in the "scooping paradox" which we have detected in the free-text answers of our survey: some authors choose to preprint as protection from scooping while others choose not to preprint due to fear of being scooped, which may be a consequence of the perception of using preprints in different disciplines.
Lastly, we focus our study on a single preprint server for biological sciences. We encourage additional studie http://journals.plos.org/plosone/s/file?id=c819/plos-one-author-formattingchecklist.docx http://journals.plos.org/plosone/s/file?id=c819/plos-one-author-formattingchecklist.docxs that focus on, and compare our results with, both other servers targeting preprint publishing from the field of biology and other research disciplines where the available preprint infrastructure (e.g. discipline-specific preprint servers) and usage amongst researchers differs. This would provide valuable insights into intra-discipline-and inter-discipline-preprint posting behaviour as well as the role of self-deposition vs. deposition offered by publishers at the stage of journal submission.